Background

Brief

This notebook is a preparation guide for building interactive plots in a Shiny web app. It uses the New York City Airbnb Open Data from Kaggle, which describes listing activity and metrics in New York City in 2019. The end goal is an interactive Shiny dashboard; the groundwork it needs, such as data cleaning and initial visualization, is done first in this notebook.

Libraries and Setup

The following packages are required in this notebook. Install any that are missing with install.packages() and load them with library(). Here is a brief explanation of what each one is used for:

  • tidyverse: data transformation
  • plotly: interactive plotting
  • glue: building custom tooltip text for interactive plots
  • scales: formatting axis labels in plots
  • lubridate: handling date and time data
  • leaflet: creating interactive maps

Packages for the Shiny app:

  • shiny: creating the Shiny app
  • shinydashboard: creating the Shiny dashboard layout
  • shinyWidgets: additional custom input widgets for Shiny
  • shinythemes: applying Shiny themes

library(tidyverse)
library(glue)
library(scales)
library(plotly)
library(lubridate)
library(leaflet)
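The install step described above can be automated. The sketch below uses a hypothetical helper, find_missing(), that compares the package names against installed.packages(); the actual install.packages() call is left commented out so running the block never installs anything by accident.

```r
# Packages used in this notebook and in the Shiny app
required_pkgs <- c("tidyverse", "glue", "scales", "plotly", "lubridate", "leaflet",
                   "shiny", "shinydashboard", "shinyWidgets", "shinythemes")

# Hypothetical helper: return the packages in `pkgs` that are not installed yet
find_missing <- function(pkgs) {
  setdiff(pkgs, rownames(utils::installed.packages()))
}

to_install <- find_missing(required_pkgs)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

Checking first avoids reinstalling large packages such as tidyverse every time the notebook runs.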

Data Preparation

Data Inspection

ab_nyc <- read.csv("data_input/AB_NYC_2019.csv")
head(ab_nyc)
##     id                                             name host_id   host_name
## 1 2539               Clean & quiet apt home by the park    2787        John
## 2 2595                            Skylit Midtown Castle    2845    Jennifer
## 3 3647              THE VILLAGE OF HARLEM....NEW YORK !    4632   Elisabeth
## 4 3831                  Cozy Entire Floor of Brownstone    4869 LisaRoxanne
## 5 5022 Entire Apt: Spacious Studio/Loft by central park    7192       Laura
## 6 5099        Large Cozy 1 BR Apartment In Midtown East    7322       Chris
##   neighbourhood_group neighbourhood latitude longitude       room_type price
## 1            Brooklyn    Kensington 40.64749 -73.97237    Private room   149
## 2           Manhattan       Midtown 40.75362 -73.98377 Entire home/apt   225
## 3           Manhattan        Harlem 40.80902 -73.94190    Private room   150
## 4            Brooklyn  Clinton Hill 40.68514 -73.95976 Entire home/apt    89
## 5           Manhattan   East Harlem 40.79851 -73.94399 Entire home/apt    80
## 6           Manhattan   Murray Hill 40.74767 -73.97500 Entire home/apt   200
##   minimum_nights number_of_reviews last_review reviews_per_month
## 1              1                 9  2018-10-19              0.21
## 2              1                45  2019-05-21              0.38
## 3              3                 0                            NA
## 4              1               270  2019-07-05              4.64
## 5             10                 9  2018-11-19              0.10
## 6              3                74  2019-06-22              0.59
##   calculated_host_listings_count availability_365
## 1                              6              365
## 2                              2              355
## 3                              1              365
## 4                              1              194
## 5                              1                0
## 6                              1              129
str(ab_nyc)
## 'data.frame':    48895 obs. of  16 variables:
##  $ id                            : int  2539 2595 3647 3831 5022 5099 5121 5178 5203 5238 ...
##  $ name                          : chr  "Clean & quiet apt home by the park" "Skylit Midtown Castle" "THE VILLAGE OF HARLEM....NEW YORK !" "Cozy Entire Floor of Brownstone" ...
##  $ host_id                       : int  2787 2845 4632 4869 7192 7322 7356 8967 7490 7549 ...
##  $ host_name                     : chr  "John" "Jennifer" "Elisabeth" "LisaRoxanne" ...
##  $ neighbourhood_group           : chr  "Brooklyn" "Manhattan" "Manhattan" "Brooklyn" ...
##  $ neighbourhood                 : chr  "Kensington" "Midtown" "Harlem" "Clinton Hill" ...
##  $ latitude                      : num  40.6 40.8 40.8 40.7 40.8 ...
##  $ longitude                     : num  -74 -74 -73.9 -74 -73.9 ...
##  $ room_type                     : chr  "Private room" "Entire home/apt" "Private room" "Entire home/apt" ...
##  $ price                         : int  149 225 150 89 80 200 60 79 79 150 ...
##  $ minimum_nights                : int  1 1 3 1 10 3 45 2 2 1 ...
##  $ number_of_reviews             : int  9 45 0 270 9 74 49 430 118 160 ...
##  $ last_review                   : chr  "2018-10-19" "2019-05-21" "" "2019-07-05" ...
##  $ reviews_per_month             : num  0.21 0.38 NA 4.64 0.1 0.59 0.4 3.47 0.99 1.33 ...
##  $ calculated_host_listings_count: int  6 2 1 1 1 1 1 1 1 4 ...
##  $ availability_365              : int  365 355 365 194 0 129 0 220 0 188 ...
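Notice in the str() output that last_review is a character column in which missing dates appear as empty strings ("") rather than NA. By default, read.csv() treats a blank field as missing only in numeric or logical columns; in a character column it stays "". A minimal sketch with a toy CSV (made-up values, not the real file) shows how the na.strings argument changes this:

```r
# Toy CSV mimicking the last_review column; the values are made up
csv_text <- "id,last_review,reviews_per_month
1,2018-10-19,0.21
2,,NA"

default_read <- read.csv(text = csv_text)
explicit_na  <- read.csv(text = csv_text, na.strings = c("", "NA"))

sum(is.na(default_read$last_review))  # 0: the empty string is kept as ""
sum(is.na(explicit_na$last_review))   # 1: the empty string becomes NA
```

Either way, the empty dates end up as NA once last_review is converted to a Date, which is why they are counted later by colSums(is.na()).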

A description of each feature:

  • id: listing ID
  • name: name of the listing
  • host_id: host ID
  • host_name: name of the host
  • neighbourhood_group: borough where the listing is located
  • neighbourhood: neighbourhood within the borough
  • latitude: latitude coordinate
  • longitude: longitude coordinate
  • room_type: type of listing space
  • price: price in US dollars
  • minimum_nights: minimum number of nights per booking
  • number_of_reviews: number of reviews
  • last_review: date of the latest review
  • reviews_per_month: number of reviews per month
  • calculated_host_listings_count: number of listings per host
  • availability_365: number of days in a year when the listing is available for booking

Data Cleaning

Inspecting the data shows that some features have the wrong types (for example, last_review is stored as character instead of a date) and that some values are missing. We also do not need the IDs for visualization, so I will drop id and host_id first.

ab_nyc <- ab_nyc %>% 
  select(-c(id, host_id))

The name neighbourhood_group is unclear, since its values are New York City boroughs such as Brooklyn and Manhattan. I will rename it to borough instead.

ab_nyc <- ab_nyc %>% 
  rename(borough = neighbourhood_group)

In room_type, one of the values is Entire home/apt. This label will later appear in the interactive plot's text, so I will first change all room types into a cleaner, title-case format.

unique(ab_nyc$room_type)
## [1] "Private room"    "Entire home/apt" "Shared room"
ab_nyc <- ab_nyc %>% 
  mutate(room_type = recode(room_type, 
                            "Entire home/apt" = "Entire Home/Apartment",
                            "Private room" = "Private Room",
                            "Shared room" = "Shared Room"))
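The same relabeling can also be expressed without dplyr using a named lookup vector, which can be handy to reuse in the Shiny app. This is a base-R alternative, not the method used in this notebook; room_labels is a hypothetical name.

```r
# Named lookup vector: original value -> display label
room_labels <- c("Entire home/apt" = "Entire Home/Apartment",
                 "Private room"    = "Private Room",
                 "Shared room"     = "Shared Room")

raw_types <- c("Private room", "Entire home/apt", "Shared room", "Private room")

# Index the lookup by the raw values; unname() drops the original names
pretty_types <- unname(room_labels[raw_types])
pretty_types
```

One difference from recode(): a value missing from the lookup becomes NA instead of being kept unchanged, so the lookup must cover every category.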

Feature Type

We need to convert the types of the following features:

  • borough: categorical (factor)
  • neighbourhood: categorical (factor)
  • room_type: categorical (factor)
  • last_review: date

ab_nyc <- ab_nyc %>% 
  mutate(across(c(borough, neighbourhood, room_type),
                factor),
         last_review = ymd(last_review))
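As a small base-R illustration of what these two conversions do (toy values, not the full dataset): factor() turns text labels into a fixed set of levels, and parsing an empty string as a date yields NA, which is why row 3 shows <NA> for last_review after conversion. The notebook itself uses lubridate::ymd(); as.Date() with an explicit format is used here only so the sketch has no dependencies.

```r
borough_raw <- c("Brooklyn", "Manhattan", "Manhattan", "Brooklyn")
borough_fct <- factor(borough_raw)
levels(borough_fct)  # "Brooklyn" "Manhattan" (sorted alphabetically)

review_raw  <- c("2018-10-19", "2019-05-21", "", "2019-07-05")
# With an explicit format, an empty string parses to NA instead of raising an error
review_date <- as.Date(review_raw, format = "%Y-%m-%d")
is.na(review_date)   # FALSE FALSE TRUE FALSE
```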

head(ab_nyc)
##                                               name   host_name   borough
## 1               Clean & quiet apt home by the park        John  Brooklyn
## 2                            Skylit Midtown Castle    Jennifer Manhattan
## 3              THE VILLAGE OF HARLEM....NEW YORK !   Elisabeth Manhattan
## 4                  Cozy Entire Floor of Brownstone LisaRoxanne  Brooklyn
## 5 Entire Apt: Spacious Studio/Loft by central park       Laura Manhattan
## 6        Large Cozy 1 BR Apartment In Midtown East       Chris Manhattan
##   neighbourhood latitude longitude             room_type price minimum_nights
## 1    Kensington 40.64749 -73.97237          Private Room   149              1
## 2       Midtown 40.75362 -73.98377 Entire Home/Apartment   225              1
## 3        Harlem 40.80902 -73.94190          Private Room   150              3
## 4  Clinton Hill 40.68514 -73.95976 Entire Home/Apartment    89              1
## 5   East Harlem 40.79851 -73.94399 Entire Home/Apartment    80             10
## 6   Murray Hill 40.74767 -73.97500 Entire Home/Apartment   200              3
##   number_of_reviews last_review reviews_per_month
## 1                 9  2018-10-19              0.21
## 2                45  2019-05-21              0.38
## 3                 0        <NA>                NA
## 4               270  2019-07-05              4.64
## 5                 9  2018-11-19              0.10
## 6                74  2019-06-22              0.59
##   calculated_host_listings_count availability_365
## 1                              6              365
## 2                              2              355
## 3                              1              365
## 4                              1              194
## 5                              1                0
## 6                              1              129

Missing Values

colSums(is.na(ab_nyc))
##                           name                      host_name 
##                              0                              0 
##                        borough                  neighbourhood 
##                              0                              0 
##                       latitude                      longitude 
##                              0                              0 
##                      room_type                          price 
##                              0                              0 
##                 minimum_nights              number_of_reviews 
##                              0                              0 
##                    last_review              reviews_per_month 
##                          10052                          10052 
## calculated_host_listings_count               availability_365 
##                              0                              0

There are 10052 missing values in both last_review and reviews_per_month. There seems to be no reliable way to impute them, and neither feature is essential for the interactive plots, so I will drop both.

ab_nyc <- ab_nyc %>% 
  select(-c(last_review, reviews_per_month))
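Dropping is one option. Another is to check whether the missing values simply mark listings that have never been reviewed: in the first head() output, row 3 has number_of_reviews = 0 together with an empty last_review and an NA reviews_per_month. The sketch below uses a hypothetical helper, na_means_no_reviews(), on toy data. If the same check returned TRUE on ab_nyc (run before the columns are dropped), filling reviews_per_month with 0 would be a reasonable alternative to removing it.

```r
# Hypothetical helper: TRUE if every NA in reviews_per_month belongs to a listing
# with zero reviews (i.e. the NA means "never reviewed", not "unknown")
na_means_no_reviews <- function(df) {
  missing_rows <- is.na(df$reviews_per_month)
  all(df$number_of_reviews[missing_rows] == 0)
}

toy_reviews <- data.frame(number_of_reviews = c(9, 45, 0, 270),
                          reviews_per_month = c(0.21, 0.38, NA, 4.64))

na_means_no_reviews(toy_reviews)  # TRUE

# If the check holds, impute 0 instead of dropping the column
toy_reviews$reviews_per_month[is.na(toy_reviews$reviews_per_month)] <- 0
```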

Duplicated Data

sum(duplicated(ab_nyc))
## [1] 0
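duplicated() compares whole rows, so it only flags rows that match in every remaining column. A narrower check on a subset of columns can reveal near-duplicates, such as the same listing posted twice at different prices. Using name, latitude, and longitude as the key is an assumption for illustration, shown here on toy data:

```r
toy_dup <- data.frame(name      = c("Cozy room", "Cozy room", "Cozy room"),
                      latitude  = c(40.70, 40.70, 40.70),
                      longitude = c(-73.92, -73.92, -73.92),
                      price     = c(60, 65, 60))

sum(duplicated(toy_dup))                                           # 1: rows 1 and 3 match in every column
sum(duplicated(toy_dup[, c("name", "latitude", "longitude")]))     # 2: price is ignored
```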

There are no duplicated rows in the dataset, so we can proceed to the visualization part. Before that, here is all the cleaning code combined into a single pipeline (commented out here):

#ab_nyc <- read.csv("data_input/AB_NYC_2019.csv")
#ab_nyc <- ab_nyc %>% 
#  select(-c(id, host_id, last_review, reviews_per_month)) %>% 
#  rename(borough = neighbourhood_group) %>% 
#  
#  mutate(across(c(borough, neighbourhood, room_type),
#                factor)) %>% 
#  
#  mutate(room_type = recode(room_type, 
#                            "Entire home/apt" = "Entire Home/Apartment",
#                            "Private room" = "Private Room",
#                            "Shared room" = "Shared Room"))

Data Visualization

Below are the features I want to include in the Shiny dashboard:

  • Bar plot that shows the top-n listings based on some filters, e.g. room type or price
  • Data table
  • Map showing all available listings in New York City

Bar Plot

To show the top-n listings, we need a metric to rank them by. The dataset has no review score, so the only usable metric is number_of_reviews. I think it is a reasonable choice, since more reviews suggest that a listing is more popular. It does not guarantee that the listing is the best option (some reviews may be negative), but without a review score, let's proceed with the number of reviews for now.

Although the final bar plot will change based on users' input, I will only create a single base plot here. When building the Shiny dashboard, I will replace the hard-coded filters and title so that the plot responds to users' input. For now, the plot shows the top 5 private room listings under $250 in Brooklyn and Manhattan.

bar_df <- ab_nyc %>%
  filter(borough %in% c("Brooklyn", "Manhattan"),
         room_type == "Private Room",
         price <= 250) %>% 
  slice_max(number_of_reviews, n = 5)

bar_df
##                                       name host_name   borough   neighbourhood
## 1               Great Bedroom in Manhattan        Jj Manhattan          Harlem
## 2           Beautiful Bedroom in Manhattan        Jj Manhattan          Harlem
## 3             Private Bedroom in Manhattan        Jj Manhattan          Harlem
## 4 Manhattan Lux Loft.Like.Love.Lots.Look !     Carol Manhattan Lower East Side
## 5          LG Private Room/Family Friendly     Wanda  Brooklyn        Bushwick
##   latitude longitude    room_type price minimum_nights number_of_reviews
## 1 40.82085 -73.94025 Private Room    49              1               607
## 2 40.82124 -73.93838 Private Room    49              1               597
## 3 40.82264 -73.94041 Private Room    49              1               594
## 4 40.71921 -73.99116 Private Room    99              2               540
## 5 40.70283 -73.92131 Private Room    60              3               480
##   calculated_host_listings_count availability_365
## 1                              3              293
## 2                              3              342
## 3                              3              339
## 4                              1              179
## 5                              1                0
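Since the Shiny version will swap the hard-coded filters for user inputs, it can help to wrap the filtering step in a function. The sketch below is a base-R version with hypothetical names (top_listings() and its arguments); in the app, these arguments would come from input widgets. It also sidesteps one detail of slice_max(): ties are kept by default (with_ties = TRUE), so it can return more than n rows, whereas ordering and taking the first n rows always returns at most n.

```r
# Hypothetical helper: top-n listings by review count under user-chosen filters
top_listings <- function(df, boroughs, room, max_price, n = 5) {
  keep <- df$borough %in% boroughs &
    df$room_type == room &
    df$price <= max_price
  filtered <- df[keep, , drop = FALSE]
  # Sort by number_of_reviews (descending) and keep at most n rows
  ordered <- filtered[order(filtered$number_of_reviews, decreasing = TRUE), , drop = FALSE]
  head(ordered, n)
}

toy_listings <- data.frame(
  name              = c("A", "B", "C", "D", "E"),
  borough           = c("Brooklyn", "Manhattan", "Queens", "Manhattan", "Brooklyn"),
  room_type         = c("Private Room", "Private Room", "Private Room",
                        "Shared Room", "Private Room"),
  price             = c(60, 49, 40, 30, 300),
  number_of_reviews = c(480, 607, 700, 900, 1000)
)

top_listings(toy_listings, c("Brooklyn", "Manhattan"), "Private Room", 250, n = 2)
```

In a Shiny server function, a call such as top_listings(ab_nyc, input$borough, input$room_type, input$max_price) would replace the hard-coded filter; those input IDs are placeholders.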
bar_plot <- bar_df %>% 
  ggplot(mapping = aes(x = reorder(name, number_of_reviews),
                       y = number_of_reviews,
                       # Tooltip text shown by ggplotly()
                       text = glue("{name}
                                 Location: {neighbourhood}, {borough}
                                 Price: ${price}
                                 Reviews Count: {number_of_reviews}"))) +
  geom_col(fill = "#2c3e50") +
  # Label each bar with its review count, just past the end of the bar
  geom_text(aes(label = number_of_reviews,
                y = number_of_reviews + 12),
            size = 3,
            colour = "black") +
  labs(title = "Top 5 Private Room Listings under $250 in Brooklyn and Manhattan",
       x = NULL,
       y = "Number of Reviews") +
  # Wrap long listing names at 20 characters
  scale_x_discrete(labels = wrap_format(20)) +
  coord_flip() +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5))

# Convert to an interactive plot that shows only the custom text on hover
ggplotly(bar_plot, tooltip = "text") %>% 
  layout(hoverlabel = list(bgcolor = "#b5e2ff"))

Bubble Map

Create the listing icon and the popup content.

bnb_icon <- makeIcon(
  iconUrl = "assets/home.png",
  iconWidth = 30, 
  iconHeight = 30
)

popup <- paste(sep = "",
               ab_nyc$name, "<br>",
               "Room Type: ", ab_nyc$room_type, "<br>",
               "Price: $", ab_nyc$price,"<br>",
               "Number of Reviews: ", ab_nyc$number_of_reviews
               )
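Leaflet renders popups as HTML, so characters such as < or & in free-text listing names (for example "Clean & quiet apt home by the park") could be misinterpreted as markup. htmltools::htmlEscape() is the usual fix; the dependency-free sketch below uses a hypothetical escape_html() helper and toy values to show the idea.

```r
# Hypothetical helper: escape the characters that have special meaning in HTML.
# "&" is replaced first so the other entities are not double-escaped.
escape_html <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  x <- gsub(">", "&gt;", x, fixed = TRUE)
  gsub("\"", "&quot;", x, fixed = TRUE)
}

listing_names <- c("Clean & quiet apt home by the park", "Skylit Midtown Castle")
prices        <- c(149, 225)

# Same structure as the popup above, with the free-text name escaped
safe_popup <- paste0(escape_html(listing_names), "<br>",
                     "Price: $", prices)
```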

Create the map. Setting minZoom = 10 limits how far users can zoom out, which keeps the map focused on New York City.

bubble_map <- leaflet(options = leafletOptions(zoomControl = FALSE,
                                               minZoom = 10)) %>% 
  setView(lng = -73.935242, lat = 40.730610, zoom = 10) %>%
  
  addTiles() %>% 
  
  addMarkers(lat = ab_nyc$latitude,
             lng = ab_nyc$longitude,
             icon = bnb_icon,
             popup = popup,
             clusterOptions = markerClusterOptions()
             ) %>%
  
  addProviderTiles(providers$CartoDB.PositronNoLabels) %>%
  
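  # Note: Stamen tiles are now served by Stadia Maps (which may require an API key),
  # so in recent leaflet versions these provider names may need to change
  addProviderTiles(providers$Stamen.TonerLines,
                   options = providerTileOptions(opacity = 0.5)) %>%
  
  addProviderTiles(providers$Stamen.TonerLabels) %>% 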
  addProviderTiles(providers$OpenSeaMap)

bubble_map